Implement first version of JMH microbenchmarks #18
Conversation
```java
public static final List<BenchmarkObject> BENCHMARK_OBJECTS =
    ImmutableList.of(
        BenchmarkObject.builder()
            .keyName("random-1mb.txt")
```
NIT: can we have these key names defined as constants? They seem to be accessed from multiple files
Yep, I don't know how I didn't notice this :( I will address this in a follow-up PR.
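For illustration, a minimal sketch of the suggested refactor might look like the following. The class and constant names here are hypothetical, not the PR's actual code:

```java
// Hypothetical holder for the object keys shared across benchmark files,
// so each key name has a single source of truth.
public final class BenchmarkKeys {
    public static final String RANDOM_1MB = "random-1mb.txt";

    private BenchmarkKeys() {} // utility class, no instances
}
```

Call sites would then reference `BenchmarkKeys.RANDOM_1MB` (e.g. `.keyName(BenchmarkKeys.RANDOM_1MB)`) instead of repeating the string literal in multiple files.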
Just run `./gradlew jmh --rerun`. (The `--rerun` flag works around a Gradle quirk: you may want to re-run benchmarks even when the source of your project has not changed, and `--rerun` disables the Gradle optimisation that skips build steps when nothing has changed.)
Can we have a script that takes the bucket name/prefix as a command-line argument, creates the bucket and generates the data if they do not exist, and then runs the benchmark as well?
That's a great suggestion, and it will be very useful for new contributors once this gets open-sourced. For now, I will try not to block on it, so I created a backlog item for this.
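As a rough illustration of what that backlog item could look like, a Java entry point along these lines could check for the bucket and generate data before handing off to the benchmarks. The `DataGenerator` hook is a hypothetical placeholder for the PR's generator utility; the S3 calls are the standard AWS SDK v2 API:

```java
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.CreateBucketRequest;
import software.amazon.awssdk.services.s3.model.HeadBucketRequest;
import software.amazon.awssdk.services.s3.model.NoSuchBucketException;

public final class BenchmarkSetup {
    public static void main(String[] args) {
        if (args.length < 2) {
            System.err.println("usage: BenchmarkSetup <bucket> <prefix>");
            System.exit(1);
        }
        String bucket = args[0];
        String prefix = args[1];

        try (S3Client s3 = S3Client.create()) {
            try {
                // Cheap existence check; throws if the bucket is missing.
                s3.headBucket(HeadBucketRequest.builder().bucket(bucket).build());
            } catch (NoSuchBucketException e) {
                // Bucket is missing: create it, then generate the test data.
                s3.createBucket(CreateBucketRequest.builder().bucket(bucket).build());
                // DataGenerator is a stand-in for the PR's generator utility:
                // DataGenerator.generate(s3, bucket, prefix);
            }
        }
        // Finally, hand off to JMH, e.g. by invoking `./gradlew jmh --rerun`
        // with BUCKET and PREFIX exported as described in the README.
    }
}
```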
Description of changes:
Now that we have a functional (correct but slow) implementation of S3SeekableStream, it makes sense to start monitoring its performance.
This PR implements basic microbenchmarks that test full sequential reads, forward seeks, backward seeks, and a Parquet-like ("jumping around") access pattern. For now, we only compare against the performance of a single standard (i.e., non-CRT) S3 async client. Using these benchmarks, we can start implementing optimisations and get relatively quick feedback on what (if anything) they improved.
To run the microbenchmarks, one has to assume AWS credentials and specify two environment variables (a BUCKET and a PREFIX). We include a generator utility so that setup is easy and this can later be open-sourced. The README is updated with instructions for running these benchmarks.
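To make the shape of these benchmarks concrete, here is a minimal JMH-style sketch of the sequential-read case. The `S3SeekableStream` constructor, its `read()` signature, and its closeability are assumptions made for illustration; the PR's actual benchmark classes define the real harness:

```java
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class SequentialReadBenchmark {

    // BUCKET and PREFIX come from the environment, as described above.
    private final String bucket = System.getenv("BUCKET");
    private final String prefix = System.getenv("PREFIX");

    @Param({"random-1mb.txt"}) // illustrative object key
    public String keyName;

    @Benchmark
    public long sequentialRead() throws Exception {
        // Assumed API: an InputStream-like, AutoCloseable seekable stream.
        try (S3SeekableStream stream = new S3SeekableStream(bucket, prefix + keyName)) {
            long total = 0;
            while (stream.read() != -1) {
                total++;
            }
            return total; // returned so JMH cannot dead-code-eliminate the read loop
        }
    }
}
```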
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
An example output of a run looks like this: